Less temperamental bwc artifacts check when between versions #7398

peternied · 2023-05-03T19:19:18Z

Description

During the release process there are times when OpenSearch has branches such as 2.x that have increamented their build version number, but that number had not be backported to main. During this time bwc test can fail because the expected artifact version in main is lower than the version from the branch that is built.

This pull request allows for a margin of error between the expected artifacts with acceptable alternative artifacts which match either the next patch version, or the next minor version. This should prevent unrelated changes from being caught by the misfortunate time between version update backports.

Example output if alternative artifacts are found:

 [2.7.0] BUILD SUCCESSFUL in 3s
 [2.7.0] 170 actionable tasks: 3 executed, 167 up-to-date
Building 2.7.0 found acceptable alternative artifact opensearch-min-2.7.1-SNAPSHOT-linux-x64.tar.gz

Related Issues

Resolves [BUG] [Build Break] buildBwcLinuxTar is expecting 2.7.0 artifact but getting 2.7.1 #7396

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff
~~Commit changes are listed out in CHANGELOG.md file (See: Changelog)~~

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Changed the error message to include other files in the output directory even if this projectArtifact wasn't found. In this case it looks like 2.7.0 -> 2.7.1 version bump is causing the troubled, unrelated to this pull request. Signed-off-by: Peter Nied <petern@amazon.com>

Signed-off-by: Peter Nied <petern@amazon.com>

reta · 2023-05-03T19:24:28Z

Personally, I think it will hide the problem instead of manifesting it. Example right here: yesterday, for some reason, the autocut bot did not create pull request for main but only for 2.x and 2.7. We noticed that because build started to fail and the fix was sent shortly. With this change - the problem could stay unnoticed for very long, my 5 cents.

github-actions · 2023-05-03T19:28:14Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/14815/
CommitID: 9eaaa8f
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

peternied · 2023-05-03T19:35:58Z

Yesterday, for some reason, the autocut bot did not create pull request for main but only for 2.x and 2.7. We noticed that because build started to fail and the fix was sent shortly.

That doesn't sound good, glad it was caught. There are 95 pull requests in flight, should we be blocking all those contributions because of that issue?

With this change - the problem could stay unnoticed for very long, my 5 cents.

These tests seems to be built to monitor specific aspects of the project. If we were to run them on a daily schedule where maintainers/interested contributors could monitor the results would that be a valuable alternative?

As a contributor to OpenSearch, I have to run and re-run the CI checks so often because of flaky tests or unrelated changes in flight, its discouraging.

Signed-off-by: Peter Nied <petern@amazon.com>

github-actions · 2023-05-03T20:10:12Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/14820/
CommitID: 63dc1d7
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

peternied · 2023-05-03T20:15:02Z

Test failures are legit, they are associated with the version number mismatch during upgrade test. Depending on what others feel like I can update these or not. Details https://build.ci.opensearch.org/job/gradle-check/14820/testReport/

org.opensearch.upgrades.FullClusterRestartIT.testSnapshotRestore

Failing for the past 1 build (Since #14820 )
Took 8.7 sec.
Error Message
java.lang.AssertionError: expected:<[2.7.0]> but was:<[2.7.1]>

reta · 2023-05-03T20:23:59Z

That doesn't sound good, glad it was caught. There are 95 pull requests in flight, should we be blocking all those contributions because of that issue?

These tests seems to be built to monitor specific aspects of the project. If we were to run them on a daily schedule where maintainers/interested contributors could monitor the results would that be a valuable alternative?

The BWC tests are important, any change could basically break them. It sad it happens but it does not happen every day - fixing the automation (autocut bot) would help.

As a contributor to OpenSearch, I have to run and re-run the CI checks so often because of flaky tests or unrelated changes in flight, its discouraging.

We probably should not mix flaky (yes, it is discouraging) and BWC tests - this failure is stable. We release once so often, the pull requests which were green - stay green, no action required. The pull requests which are in work (and are failing checks) should be rebased / merged with target branch (that window would have been minimized if automation worked out well), this is the good practice anyway since upstream is a moving target.

saratvemulapalli · 2023-05-03T20:50:30Z

+1 to @reta. I am afraid we'll kick the can down the road and it will silently wait until the next release.
This mechanism enforces quality while trading for more dev cycles which is fair in my opinion.

dblock · 2023-05-03T20:57:46Z

I think AUTOCUT bot isn't the problem, the build failed and it told us. Is there (there should be) a way to make version increment PRs in such an order that doesn't ever break bwc tests? For example, if we added 2.7.1 on main, first?

reta · 2023-05-03T21:10:58Z

I think AUTOCUT bot isn't the problem, the build failed and it told us.

Sorry, I haven't seen that, mind sharing the link? Because #7370 and #7371 were right on time.

Is there (there should be) a way to make version increment PRs in such an order that doesn't ever break bwc tests? For example, if we added 2.7.1 on main, first?

Not sure, I believe we build by checking out branches (2.7 / 2.x /... ), needs some research

peternied · 2023-05-03T21:29:42Z

It looks like those change to increment the version follow the normal pull request process. I think we could make it faster. What if instead of waiting on human approval / CI results we immediately pushed the changes to all branches.

We have nearly all the automation to do this via GitHub action, how would y'all feel about changing the workflow to do it all at once?

reta · 2023-05-03T21:53:52Z

It looks like those change to increment the version follow the normal pull request process. I think we could make it faster. What if instead of waiting on human approval / CI results we immediately pushed the changes to all branches.

We have nearly all the automation to do this via GitHub action, how would y'all feel about changing the workflow to do it all at once?

I was also wondering if we could do something like "multi branch commit" (sadly not a thing in git), but in general, I think it may work - push to respective branch bypassing pull request phase.

kotwanikunal · 2023-05-03T21:56:43Z

I think AUTOCUT bot isn't the problem, the build failed and it told us. Is there (there should be) a way to make version increment PRs in such an order that doesn't ever break bwc tests? For example, if we added 2.7.1 on main, first?

I think the bot has some issues since it did not automatically raise the PR for main, which IIRC it used to.

peternied · 2023-05-03T22:06:33Z

Adding bwc version 2.7.1 after 2.7.0
Adding V_2_7_1 after V_2_7_0
sed: can't read server/src/main/java/org/opensearch/Version.java: No such file or directory
Error: Process completed with exit code 2.

https://github.com/opensearch-project/OpenSearch/actions/runs/4865699230/jobs/8676359444

I'll start up another PR

[BUG] version update GitHub workflow is broken #7411

This change alters the version workflow to create a new issue, then create several commits linked back to that issue and finally close the issue when all commits are completed for tracability of version updates. Background As version updates have different propigation speeds depending on pull request reviews, consistant CI in many branches, and an careful order of execution this would cause failures outside the scope of changes that would block changes into main and other branches. This change removes the use of pull requests and the standard ci gates. To ensure that the changes operate consistantly created a GitHub Action to manage the file modification peternied/opensearch-core-version-updater See an example of this workflows results with #66 Related issues - Resolves opensearch-project#7411 - Related Discussion opensearch-project#7398 - Resolves opensearch-project#7396 Signed-off-by: Peter Nied <petern@amazon.com>

peternied · 2023-05-05T05:37:04Z

I don't think it makes sense to move forward with this pull request as is. Reducing the temperament seems more likely to cause issues to linger instead of being addressed.

In place of this change, I've created another pull request to address the root cause which is the delay between updates in different branches.

Rewrite version update GHA to commit and push #7442

peternied added 3 commits May 3, 2023 18:38

Add alternative dit files

4bb4847

Signed-off-by: Peter Nied <petern@amazon.com>

Accept alternative artifacts

9eaaa8f

Signed-off-by: Peter Nied <petern@amazon.com>

peternied requested review from reta, anasalkouz, andrross, Bukhtawar, CEHENKLE, dblock, gbbafna, setiah, kartg, kotwanikunal, mch2, nknize, owaiskazi19, Rishikesh1159, ryanbogan, saratvemulapalli, shwetathareja, dreamer-89, tlfeng and VachaShah as code owners May 3, 2023 19:19

peternied added the skip-changelog label May 3, 2023

Spotless style fixes

63dc1d7

Signed-off-by: Peter Nied <petern@amazon.com>

peternied mentioned this pull request May 3, 2023

[BUG] [Build Break] buildBwcLinuxTar is expecting 2.7.0 artifact but getting 2.7.1 #7396

Open

peternied mentioned this pull request May 5, 2023

Rewrite version update GHA to commit and push #7442

Closed

11 tasks

peternied closed this May 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Less temperamental bwc artifacts check when between versions #7398

Less temperamental bwc artifacts check when between versions #7398

peternied commented May 3, 2023

reta commented May 3, 2023

github-actions bot commented May 3, 2023

peternied commented May 3, 2023

github-actions bot commented May 3, 2023

peternied commented May 3, 2023

reta commented May 3, 2023 •

edited

Loading

saratvemulapalli commented May 3, 2023

dblock commented May 3, 2023 •

edited

Loading

reta commented May 3, 2023

peternied commented May 3, 2023

reta commented May 3, 2023

kotwanikunal commented May 3, 2023

peternied commented May 3, 2023 •

edited

Loading

peternied commented May 5, 2023

Less temperamental bwc artifacts check when between versions #7398

Less temperamental bwc artifacts check when between versions #7398

Conversation

peternied commented May 3, 2023

Description

Related Issues

Check List

reta commented May 3, 2023

github-actions bot commented May 3, 2023

Gradle Check (Jenkins) Run Completed with:

peternied commented May 3, 2023

github-actions bot commented May 3, 2023

Gradle Check (Jenkins) Run Completed with:

peternied commented May 3, 2023

reta commented May 3, 2023 • edited Loading

saratvemulapalli commented May 3, 2023

dblock commented May 3, 2023 • edited Loading

reta commented May 3, 2023

peternied commented May 3, 2023

reta commented May 3, 2023

kotwanikunal commented May 3, 2023

peternied commented May 3, 2023 • edited Loading

peternied commented May 5, 2023

reta commented May 3, 2023 •

edited

Loading

dblock commented May 3, 2023 •

edited

Loading

peternied commented May 3, 2023 •

edited

Loading